5. Main Analysis
It is commonly thought that higher APM comes naturally through becoming more skilled at SC2. In the SC2 community, it is believed that there is a rough hierarchy of skills that one should acquire in order to become skilled. The variables corresponding to these skills, ranked in approximate descending order of importance, are the following:
- WorkersMade
- MinimapAttacks / MinimapRightClicks
- AssignToHotkeys / SelectByHotkeys
- TotalMapExplored
- UniqueUnitsMade
- ComplexAbilitiesUsed / ComplexUnitsMade
- UniqueHotkeys
It should be noted that these variables are rates expressed per “timestamp”, which is the unit in which the game measures time. There are approximately 88.5 timestamps per second according to the dataset documentation. Since such a short unit of time produces very small values, I will express these variables per minute (multiplying by 88.5 × 60) to make the data and graphs easier to understand. Throughout this section, I encountered outliers for several variables that made it difficult to read the graphs. I dealt with these outlier values by removing them from the dataset. I made this decision because these data points comprise an extremely small portion of the dataset. Also, since most of these metrics are more obscure and not discussed often in the community, it would be difficult to decide on a “reasonable” ceiling value to restrict these variables to.
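The per-minute conversion described above can be sketched as a small helper. This is only an illustration: the names `TIMESTAMPS_PER_MINUTE` and `per_minute` are hypothetical and do not appear in the analysis code, which multiplies by `88.5*60` inline, and the example rate is made up.

```r
# Conversion factor taken from the dataset documentation cited in the text.
TIMESTAMPS_PER_SECOND <- 88.5
TIMESTAMPS_PER_MINUTE <- TIMESTAMPS_PER_SECOND * 60  # 5310 timestamps per minute

# Hypothetical helper: convert a per-timestamp rate to a per-minute rate.
per_minute <- function(rate_per_timestamp) {
  rate_per_timestamp * TIMESTAMPS_PER_MINUTE
}

# Illustrative value only (not taken from the dataset):
# a rate of 0.001 per timestamp corresponds to 5.31 per minute.
example_rate <- 0.001
print(per_minute(example_rate))
```

Because the function is vectorized, it could be applied to a whole column at once, for example `per_minute(sc$WorkersMade)`.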
In addition, I decided to omit the MaxTimeStamp variable from my analysis. This variable indicates the length of each game. While there may be some correlation between how skilled a player is and his or her average game length, each player’s choice of strategy has a major effect on how long a particular game will be. For example, before a game begins, a player can decide to invest significant resources early on into a quick surprise attack to catch the opponent off guard. If such an attack fails, then the attacking player will be in a vulnerable position and may be defeated by the opponent’s counterattack. In this case, the game is likely to end early regardless of the outcome. On the other hand, both players can just as easily decide to focus on their own army and building production while avoiding conflicts early on. Such a game would take much longer to finish. Since game length is largely affected by conscious choices that players make in addition to behavior derived from skill level, it would be difficult to isolate these two factors in order to assess MaxTimeStamp as an indicator of skill level.
1) WorkersMade
The typical answer to the question of how one gets better at SC2 is to make more workers. This might seem surprising at first. Worker units are primarily responsible for collecting resources and constructing buildings. This might sound quite mundane, but collecting enough resources is of utmost importance, since this is the currency with which players build their army and bases. Collecting resources at a suboptimal rate due to having too few workers will create a bottleneck that will slow the rate at which your army can be produced. Additionally, workers can serve as scouts or sacrificial pawns during attacks and defenses. People often tell novices that they can get promoted out of Bronze League with this simple strategy: constantly produce workers, collect lots of resources, build an army with those resources (the type of units built doesn’t really matter), and send that army straight to the enemy base.
These humble workers are busy collecting resources inside this player’s main base.
This graph showing the number of workers each player possesses at any given time is just one of several pages of detailed statistics that are presented to players at the conclusion of each game.
The violin plot below shows that there is a slightly positive correlation between how many workers are produced per minute and the rank of the player. By looking at the median lines, one can see a slight downward trend going from Master to GrandMaster. A more pronounced decrease can be seen in the maximum values from Diamond (27.34) to GrandMaster (16.94). This suggests that one should find the proper balance between making too few workers (collecting resources at a suboptimal rate) and making too many (wasting money by producing unneeded workers who have nothing to do). On the other end, the transition from Bronze to Silver results in the median increasing from 2.934 to 3.84, which is a larger change than the transition between any other pair of adjacent leagues. This supports the claim that players can be promoted out of Bronze League simply by focusing on improving their worker production habits. Meanwhile, worker production rates level out in the higher leagues. The outlier values in Diamond League could be a sign of players producing more workers than needed, which would be an inefficient use of resources.
ggplot(sc, aes(sc$League, sc$WorkersMade*88.5*60)) + geom_violin(fill = 'purple', alpha = 0.5, draw_quantiles = c(0.5)) + geom_smooth(method = 'loess', se = FALSE, color = 'blue', size = 1.5, aes(group = 1)) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Workers Produced per Minute') + scale_y_continuous(breaks = c(seq(0,30, 2)))
#Display summary statistics for "WorkersMade" for each league
for(league in rank) {
print(league)
print(summary(filter(sc, sc$League == league)$WorkersMade*88.5*60))
}
## [1] "Bronze"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4089 2.1570 2.9340 3.3260 4.0100 10.9400
## [1] "Silver"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.8776 2.8510 3.8400 4.2540 5.0620 14.7200
## [1] "Gold"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.508 3.323 4.292 4.871 6.006 19.840
## [1] "Platinum"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.284 3.616 4.689 5.344 6.406 18.850
## [1] "Diamond"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.482 4.207 5.391 6.188 7.469 27.340
## [1] "Master"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.496 4.372 5.628 6.427 7.673 21.880
## [1] "GrandMaster"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.570 3.777 5.333 6.413 7.267 16.940
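The claim that the Bronze-to-Silver jump in the median is the largest between adjacent leagues can be checked directly from the summaries printed above. The sketch below copies the medians by hand from that output; the vector names are hypothetical and are not part of the original analysis.

```r
# League names in ascending order of skill.
league_names <- c("Bronze", "Silver", "Gold", "Platinum",
                  "Diamond", "Master", "GrandMaster")

# Median workers produced per minute, copied from the summary output above.
workers_median <- c(2.934, 3.840, 4.292, 4.689, 5.391, 5.628, 5.333)
names(workers_median) <- league_names

# Change in the median between each pair of adjacent leagues.
median_change <- diff(workers_median)
names(median_change) <- paste(head(league_names, -1), "->", tail(league_names, -1))

print(round(median_change, 3))

# Transition with the largest increase in the median.
print(names(which.max(median_change)))
```

The last line prints "Bronze -> Silver", and the Master -> GrandMaster entry is negative, which matches the slight downward trend noted in the text.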
2) MinimapAttacks / MinimapRightClicks
The second most important metric might be a surprising choice as well. The minimap is a map of the battlefield located at the bottom left corner of the user interface (as shown in the first gameplay screenshot in the introduction section). At a glance, a player can see what is happening wherever he or she has an army presence. By clicking on the minimap, players can instantly switch views to the corresponding location. More importantly, they can order units to move to or attack that location. The alternative is for the player to manually scroll to that location with the mouse, which could take much longer depending on how big the battlefield is and where the player was looking prior to scrolling. With the minimap, players can observe and react to situations occurring in the game faster and more efficiently. This can be compared to the difference between editing a long document with the ability to jump to different sections by clicking or scrolling with a mouse and having to manually scroll through the text using the arrow keys on the keyboard.
This is the interface through which players interact with their bases and armies in Starcraft II. The minimap is positioned at the bottom left corner.
ggplot(sc, aes(sc$League, sc$MinimapAttacks*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + geom_smooth(method = 'loess', se = FALSE, color = 'blue', size = 1.5, aes(group = 1)) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Minimap Attacks Issued per Minute')
ggplot(sc, aes(sc$League, sc$MinimapRightClicks*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + geom_smooth(method = 'loess', se = FALSE, color = 'blue', size = 1.5, aes(group = 1)) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Minimap Right Clicks Performed per Minute')
There are a few outlier data points that are obscuring the view of the league distributions for MinimapAttacks (17/3395) and MinimapRightClicks (7/3395). After removing these points, the distributions can be more clearly seen:
#Remove the "MinimapAttacks" outlier values and convert the values in terms of minutes
sc <- filter(sc, sc$MinimapAttacks <= 0.001003)
ggplot(sc, aes(sc$League, sc$MinimapAttacks*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + geom_smooth(method = 'loess', se = FALSE, color = 'blue', size = 1.5, aes(group = 1)) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Minimap Attacks Issued per Minute') + scale_y_continuous(breaks = c(seq(0,7, 0.25)))
for(league in rank) {
print(league)
print(summary(filter(sc, sc$League == league)$MinimapAttacks*88.5*60))
}
## [1] "Bronze"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.1502 0.1519 1.3840
## [1] "Silver"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.00000 0.07434 0.23670 0.24000 4.66900
## [1] "Gold"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.1131 0.2838 0.3807 2.9940
## [1] "Platinum"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.1845 0.3974 0.5104 4.7190
## [1] "Diamond"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.07301 0.30320 0.55590 0.69710 4.78700
## [1] "Master"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1670 0.4848 0.7678 1.0520 5.3250
## [1] "GrandMaster"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.05161 0.71670 1.27500 1.55500 1.95200 4.73800
#Remove the "MinimapRightClicks" outlier values and convert the values in terms of minutes
sc <- filter(sc, sc$MinimapRightClicks <= 0.0028)
ggplot(sc, aes(sc$League, sc$MinimapRightClicks*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + geom_smooth(method = 'loess', se = FALSE, color = 'blue', size = 1.5, aes(group = 1)) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Minimap Right Clicks Performed per Minute') + scale_y_continuous(breaks = c(seq(0, 15, 1)))
for(league in rank) {
print(league)
print(summary(filter(sc, sc$League == league)$MinimapRightClicks*88.5*60))
}
## [1] "Bronze"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.3454 0.8103 1.1050 1.6140 7.4630
## [1] "Silver"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.4721 1.0410 1.4630 2.0310 10.7900
## [1] "Gold"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 0.666 1.271 1.745 2.342 12.070
## [1] "Platinum"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.7308 1.4410 1.9160 2.5050 10.3900
## [1] "Diamond"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.8876 1.6690 2.2530 3.1530 11.4100
## [1] "Master"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.005 1.810 2.459 3.384 13.270
## [1] "GrandMaster"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1333 1.6350 2.7240 2.9810 3.8270 8.6670
In both cases, there’s a positive correlation between the metric and league. For MinimapAttacks, the slope of the LOESS line increases at the upper leagues. The large increase in the median between Master and GrandMaster (0.4848 to 1.275 in the summary above) shows the importance of using the minimap in attaining the highest level of skill. On the other end, the median also changes noticeably from Bronze to Silver (0 to 0.07434), which supports the belief that at least some minimap usage can be enough to be promoted out of Bronze League. It seems that many players from Bronze all the way to Platinum League don’t issue attack commands through the minimap or do so very rarely, since the first quartile is 0 in each of those leagues. For MinimapRightClicks, the variation between leagues isn’t as pronounced, with a roughly linear LOESS line. In the lower leagues, the median value of 0.8103 for Bronze League indicates that this is an action that most novice players already know to perform. Therefore, they just need to learn to do so more often.
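Since the outlier cutoffs in the code above are given per timestamp while the plots and summaries are per minute, it may help to see both on the same scale. The following sketch converts the two cutoffs and expresses the reported outlier counts as a share of the 3395 rows. The variable names are hypothetical, and the shares use the original row count even though the second filter was applied after the first.

```r
timestamps_per_minute <- 88.5 * 60

# Per-timestamp cutoffs used in the filter() calls above.
cutoffs <- c(MinimapAttacks = 0.001003, MinimapRightClicks = 0.0028)

# Outlier counts reported in the text, out of 3395 rows.
removed <- c(MinimapAttacks = 17, MinimapRightClicks = 7)
n_rows  <- 3395

# Cutoffs expressed per minute (about 5.33 and 14.87).
cutoffs_per_minute <- round(cutoffs * timestamps_per_minute, 2)
print(cutoffs_per_minute)

# Share of rows treated as outliers, in percent (about 0.50 and 0.21).
percent_removed <- round(100 * removed / n_rows, 2)
print(percent_removed)
```

Both cutoffs sit just above the largest per-minute maxima shown in the summaries (5.325 and 13.27), which is consistent with the filters having been applied before those summaries were computed.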
3) AssignToHotkeys and SelectByHotkeys
It could be argued that learning to use hotkeys is equally essential to improving at SC2. Hotkeys are the computer game equivalent of keyboard shortcuts, and by not using them, one would be reduced to playing the game with only a mouse. To give an idea of how inefficient and time-consuming that would be, imagine having to type by using a mouse to click each letter on a virtual keyboard at the bottom of the computer screen. This analogy is not far from the truth since most of the buttons corresponding to actions in the game are located at the bottom of the user interface.
Each possible command in the game has an associated default hotkey. In addition, players can assign specific groups of units and buildings to one of the number keys. Therefore, by pressing a number key, a player can immediately select several units or buildings simultaneously, no matter where they are on the map or how far apart they are, and issue commands to them. The variable AssignToHotkeys indicates how often a player makes these assignments, whereas SelectByHotkeys indicates how often a player controls these previously assigned groups using the hotkeys.
ggplot(sc, aes(sc$League, sc$AssignToHotkeys*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Hotkey Assignments Made per Minute')
#Remove outlier values for the "AssignToHotkeys" variable
sc <- filter(sc, sc$AssignToHotkeys*88.5*60 <= 6.5)
ggplot(sc, aes(sc$League, sc$AssignToHotkeys*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + geom_smooth(method = 'loess', se = FALSE, color = 'blue', size = 1.5, aes(group = 1)) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Hotkey Assignments Made per Minute') + scale_y_continuous(breaks = c(seq(0,7, 0.5)))
for(league in rank) {
print(league)
print(summary(filter(sc, sc$League == league)$AssignToHotkeys*88.5*60))
}
## [1] "Bronze"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.4811 0.8097 0.9849 1.3340 3.5450
## [1] "Silver"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.5614 0.9851 1.1790 1.7030 3.6620
## [1] "Gold"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.8148 1.3810 1.4990 2.0780 4.8760
## [1] "Platinum"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1237 1.0910 1.7520 1.7860 2.3820 5.3920
## [1] "Diamond"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.1651 1.5140 2.1630 2.1910 2.7800 6.2370
## [1] "Master"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.3452 1.9320 2.6080 2.6890 3.3150 6.3600
## [1] "GrandMaster"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.645 2.493 3.529 3.592 4.553 6.479
ggplot(sc, aes(sc$League, sc$SelectByHotkeys*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Hotkey Groups Selected per Minute') + scale_y_continuous(breaks = c(seq(0,200, 25)))
After removing those outliers, there are still some data points with outlier values for the SelectByHotkeys variable (32/3395) that should be removed as well.
#Remove outlier values for the "SelectByHotkeys" variable
sc <- filter(sc, sc$SelectByHotkeys*88.5*60 < 150)
ggplot(sc, aes(sc$League, sc$SelectByHotkeys*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + geom_smooth(method = 'loess', se = FALSE, color = 'blue', size = 1.2, aes(group = 1)) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Hotkey Groups Selected per Minute') + scale_y_continuous(breaks = c(seq(0,200, 25)))
for(league in rank) {
print(league)
print(summary(filter(sc, sc$League == league)$SelectByHotkeys*88.5*60))
}
## [1] "Bronze"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 1.558 3.566 5.741 7.274 74.270
## [1] "Silver"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 3.083 5.813 8.154 9.809 74.290
## [1] "Gold"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 4.692 8.201 11.600 13.500 104.000
## [1] "Platinum"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.000 6.661 11.490 16.500 19.900 124.400
## [1] "Diamond"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2825 11.0700 18.2200 24.9900 31.9400 147.0000
## [1] "Master"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.6472 15.9500 28.2200 36.1500 48.3100 148.7000
## [1] "GrandMaster"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 14.30 30.07 43.55 50.58 65.27 142.20
In Bronze and Silver League, the median and mean values for AssignToHotkeys are close to 1. From my experience with the game, this reflects the habit of assigning one’s entire army to one hotkey and updating the assignment by adding newly produced units to that hotkey group. Players who control their entire army as a single large mass mostly do so because they lack the multitasking skills to effectively control multiple groups at a time. However, this setup makes it difficult for them to split their units up to deal with threats occurring simultaneously in different places. The median and mean values for SelectByHotkeys in the lower leagues are similarly low due to the fact that there’s no need to make too many hotkey selections when your entire army is assigned to only one or two hotkey groups. This type of habit seems to disappear as one progresses through the leagues. Comparing the lower leagues to the higher ones indicates that these variables are more important in being promoted from the middle leagues. The differences between the Diamond, Master, and GrandMaster League distributions are quite pronounced. This illustrates the importance of multitasking in SC2, and specifically, the importance of maintaining many hotkey groups and continuously selecting them to issue orders to attack and defend.
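One rough way to illustrate the point about continuously selecting hotkey groups is to compare the two medians within each league. The sketch below divides the median SelectByHotkeys rate by the median AssignToHotkeys rate, using values copied from the summaries above. Note that this is a ratio of league medians, not a per-player statistic, so it should be read only as an informal indicator; the variable names are hypothetical.

```r
league_names <- c("Bronze", "Silver", "Gold", "Platinum",
                  "Diamond", "Master", "GrandMaster")

# Medians per minute, copied from the summary output above.
assign_median <- c(0.8097, 0.9851, 1.3810, 1.7520, 2.1630, 2.6080, 3.529)
select_median <- c(3.566, 5.813, 8.201, 11.490, 18.220, 28.220, 43.55)

# Ratio of median selections to median assignments, per league.
selects_per_assignment <- select_median / assign_median
names(selects_per_assignment) <- league_names

print(round(selects_per_assignment, 2))
```

The ratio rises in every step from Bronze (about 4.4) to GrandMaster (about 12.3), which suggests that selections grow faster than assignments as league increases.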
4) TotalMapExplored
The need for intelligence gathering during battles should be apparent. As a consequence of games happening under the fog of war, players have an incentive to constantly scout and observe the opposing player’s actions while hiding their own actions from enemy scouts. As previously mentioned, players can decide to perform sneak attacks early in a game to win the game quickly and decisively. Even in slower paced games, one should be aware of what kind of army the opponent is producing and where that army is moving. In SC2, each army unit has a “rock, paper, scissors” relationship with other types of units, meaning that each unit is effective at fighting certain types of units while being ineffective against others. Thus, by keeping watch on the opponent’s army, one can respond by producing the types of units that are effective against the opposing army.
In SC2, the battlefield is divided into squares that are about the size of the mouse pointer. The TotalMapExplored metric measures how many 24x24 grids of these squares a player has explored per timestamp. During the course of a game, players will naturally explore territory by virtue of expanding to new bases to collect more resources and sending their army to attack the opponent. However, a skilled player will go beyond this and intentionally send units to different parts of the map to confirm what the opponent is doing and to make sure that no sneak attacks are incoming. Timely information received from scouting can mean the difference between victory and defeat in many games.
ggplot(sc, aes(sc$League, sc$TotalMapExplored*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Number of 24x24 Map Grids Explored per Minute')+ scale_y_continuous(breaks = c(seq(0,5, 0.25)))
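The league distribution medians seem to be increasing almost linearly, but I’d like to remove the few outlier values (2/3395) close to or above 4.00 per minute to make sure there isn’t a polynomial increase being obscured. The cutoff used in the code is 0.00074 per timestamp, which is approximately 3.93 per minute.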
sc <- filter(sc, sc$TotalMapExplored < 0.00074)
ggplot(sc, aes(sc$League, sc$TotalMapExplored*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + geom_smooth(method = 'loess', se = FALSE, color = 'blue', size = 1.2, aes(group = 1)) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Number of 24x24 Map Grids Explored per Minute') + scale_y_continuous(breaks = c(seq(0,5, 0.25)))
for(league in rank) {
print(league)
print(summary(filter(sc, sc$League == league)$TotalMapExplored*88.5*60))
}
## [1] "Bronze"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5726 0.9990 1.2220 1.3300 1.5570 3.6790
## [1] "Silver"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5853 1.0320 1.3040 1.3620 1.5640 3.6100
## [1] "Gold"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.4848 1.0880 1.3060 1.3750 1.5780 3.1050
## [1] "Platinum"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5469 1.1930 1.4040 1.4680 1.6680 3.5180
## [1] "Diamond"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.651 1.270 1.515 1.570 1.799 3.903
## [1] "Master"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.621 1.303 1.548 1.636 1.866 3.696
## [1] "GrandMaster"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9095 1.2530 1.6510 1.6190 1.8620 2.7500
There’s a relatively large increase in the median going from Bronze to Silver League, which indicates that Silver League players leave their bases more often, though it’s unclear whether this is the result of purposeful scouting or just a consequence of being more active on the battlefield when performing other actions. Past Bronze League, it looks like the distributions are relatively flat until the middle leagues, according to the LOESS line. Knowing to scout seems to be a skill that’s acquired when transitioning from the Gold to Platinum Leagues. The transition from Master to GrandMaster League shows another relatively large median increase, which suggests that scouting is a crucial skill to acquire in order to reach the highest skill level.
5) UniqueUnitsMade
At first, I thought this variable measured the rate at which players were producing units. This would be a measure of a player’s macromanagement and of how efficiently he or she spends resources. However, upon converting the values to reflect minutes rather than timestamps, I saw that the values were much too low, considering that games typically last for 15-25 minutes and armies often consist of dozens of units.
ggplot(sc, aes(sc$League, sc$UniqueUnitsMade*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Unique Units Made per Minute')
These values would imply that on average, only a handful of units are produced per game, which is clearly not correct. I then tried multiplying this variable by MaxTimeStamp, which indicates how many timestamps each game contained, which would indicate the duration of each game. This should provide the total number of unique units produced per game.
ggplot(sc, aes(sc$League, sc$UniqueUnitsMade*sc$MaxTimeStamp)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Unique Units Made per Game') + scale_y_continuous(breaks = c(seq(0,15, 1)))
Notice that all of the values fall between 2 and 13. Given this range, it is most likely that this variable is the number of different types of units produced per game divided by the total length of the game, considering that each player can only choose from at most 15 types of units to produce. A value of 2 would indicate that the player produced nothing but workers and the most basic, low-tech attacking unit. Given my new understanding of this variable, I decided that I would not use it any further during this project. While making a diverse army composed of various types of units can be seen as a measure of how well a player can adapt to changing situations in the game by adjusting the composition of his or her army, it is also true that UniqueUnitsMade is naturally correlated with game length. This is because the more advanced, high-tech units cannot be produced early in the game due to their high costs and prerequisite infrastructure requirements (this idea is explained in more detail in the next section). As mentioned before, game length is in part determined by conscious decisions made by each player, so it would not be appropriate to interpret this metric as a pure measure of skill.
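The reasoning above can be made concrete with a small worked example. It assumes the interpretation reached in the text (number of distinct unit types divided by game length in timestamps) and uses an entirely hypothetical game; none of the numbers come from the dataset.

```r
timestamps_per_minute <- 88.5 * 60

# Hypothetical game: 20 minutes long, 6 distinct unit types produced.
game_minutes <- 20
unit_types   <- 6

# Game length in timestamps, i.e. what MaxTimeStamp would hold for this game.
max_timestamp <- game_minutes * timestamps_per_minute

# Value the variable would store under the interpretation above.
unique_units_rate <- unit_types / max_timestamp

# Converting to "per minute" gives a value far too small to be a production rate.
rate_per_minute <- unique_units_rate * timestamps_per_minute
print(rate_per_minute)

# Multiplying by the game length instead recovers the count of unit types.
recovered_types <- unique_units_rate * max_timestamp
print(recovered_types)
```

Under these assumptions the per-minute figure is 0.3, while multiplying by the game length returns the original 6 unit types, which mirrors the two plots above.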
6) ComplexAbilitiesUsed / ComplexUnitsMade
In SC2, there is a hierarchy to which all the different types of buildings and army units belong, called the “tech tree”. In the tech tree, buildings and units further up the tree can only be accessed once certain buildings further down in the tech tree have been constructed. This network of dependencies ensures that the inexpensive, low-tech buildings and units are accessible relatively early in the game, whereas the most powerful and expensive units are reserved for later in the game. Often, these units have powerful abilities that can be used repeatedly (with a predetermined recharge period in between uses) to devastate an opposing army if used correctly. This depends on the player manually clicking on the intended target in an accurate and timely manner. Otherwise, the attack misses its target and is wasted.
Several units have activated abilities that must be triggered at the right time or targeted abilities that require the player to directly click on the area or unit to be targeted.
ComplexAbilitiesUsed (spelled ComplexAbilityUsed in the code) measures how often a player uses such targeted abilities, whereas ComplexUnitsMade measures the rate at which these advanced units are produced. Properly utilizing these units and abilities requires another layer of micromanagement on top of everything else that a player needs to attend to during a game. Therefore, one would think that more highly skilled players will use them with greater frequency.
ggplot(sc, aes(sc$League, sc$ComplexAbilityUsed*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Targeted Abilities Used per Minute')
ggplot(sc, aes(sc$League, sc$ComplexUnitsMade*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Advanced Units Produced per Minute')
Again, the lower league distributions seem like they might be obscured due to outliers (99/3395 for ComplexAbilityUsed and 3/3395 for ComplexUnitsMade).
#Remove outlier values for "ComplexAbilityUsed" and convert the values in terms of minutes
sc <- filter(sc, sc$ComplexAbilityUsed < 0.00188)
ggplot(sc, aes(sc$League, sc$ComplexAbilityUsed*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + geom_smooth(method = 'loess', se = FALSE, color = 'blue', size = 1.2, aes(group = 1)) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Targeted Abilities Used per Minute')
for(league in rank) {
print(league)
print(summary(filter(sc, sc$League == league)$ComplexAbilityUsed*88.5*60))
}
## [1] "Bronze"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.2217 0.2392 3.3450
## [1] "Silver"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.4015 0.3045 9.3640
## [1] "Gold"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.5606 0.6701 8.3970
## [1] "Platinum"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.1625 0.7236 0.9349 9.7880
## [1] "Diamond"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.3441 0.9119 1.2080 9.5040
## [1] "Master"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.2591 0.9162 1.3490 9.1170
## [1] "GrandMaster"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.5945 0.9948 4.3870
#Remove outlier values for "ComplexUnitsMade" and convert the values in terms of minutes
sc <- filter(sc, sc$ComplexUnitsMade < 0.00075)
ggplot(sc, aes(sc$League, sc$ComplexUnitsMade*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + geom_smooth(method = 'loess', se = FALSE, color = 'blue', size = 1.2, aes(group = 1)) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Advanced Units Produced per Minute')
for(league in rank) {
print(league)
print(summary(filter(sc, sc$League == league)$ComplexUnitsMade*88.5*60))
}
## [1] "Bronze"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00000 0.00000 0.00000 0.07784 0.00000 1.68800
## [1] "Silver"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.1293 0.0000 2.6230
## [1] "Gold"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.2338 0.1362 3.1330
## [1] "Platinum"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.3398 0.5252 3.4910
## [1] "Diamond"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.3976 0.6984 3.5930
## [1] "Master"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.4002 0.6635 3.2030
## [1] "GrandMaster"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.0000 0.0000 0.3373 0.4641 2.0490
It appears that the lower league distributions for these two variables truly do look like that, meaning that players in Bronze or Silver League use these advanced units and their abilities extremely rarely. In fact, the median value of ComplexUnitsMade is 0 in every league. When looking at the means for each league, there’s a positive correlation from Bronze to Master League followed by a slight decrease in GrandMaster League. Therefore, these seem to be additional examples of skills that are rarely used in the lower leagues but are used much more often among more skilled players.
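A median of 0 alongside a positive mean simply means that more than half of the players in a league have a rate of exactly zero. The toy example below illustrates this with made-up rates for ten hypothetical players; it is not drawn from the dataset.

```r
# Made-up per-minute rates for ten hypothetical players;
# six of them never produce an advanced unit.
toy_rates <- c(0, 0, 0, 0, 0, 0, 0.4, 0.7, 1.1, 1.8)

# More than half the values are zero, so the median is zero.
toy_median <- median(toy_rates)
print(toy_median)

# The mean is pulled above zero by the few non-zero players.
toy_mean <- mean(toy_rates)
print(toy_mean)

# Share of players with a non-zero rate.
share_nonzero <- mean(toy_rates > 0)
print(share_nonzero)
```

This is why the means, rather than the medians, are the more informative statistic for comparing leagues on these two variables.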
7) UniqueHotkeys
Originally, I had ranked this metric in second place on the skill hierarchy. From the variable name and description from the researchers, I thought that this variable measured how often players used different hotkeys to input commands. However, when I converted the values to a per-minute basis, they were much too low to make sense when placed in the context of APM and the other hotkey variables. For example, how could a GrandMaster player be using less than one unique hotkey per minute when he or she is most likely performing over 150 actions per minute? After consulting with a friend, I came to the conclusion that this metric actually measures the number of hotkeys that the player changed from the default settings, expressed per timestamp. For example, players can change certain hotkey settings so that commonly used commands are assigned to neighboring keys on the keyboard to facilitate the issuing of commands. Conversely, a player can also decide to reassign a little-used command so as to avoid accidentally pressing it during hectic periods in the game. There is a steep learning curve involved in going from not using hotkeys at all, to beginning to use them, to committing them to memory so that one can use them quickly and comfortably, and finally to understanding one’s needs and customizing the hotkey setup to match them. Given this, I would expect that there wouldn’t be much of a difference in the lower league distributions for this metric, but that there would be a pronounced difference in the upper league distributions.
ggplot(sc, aes(sc$League, sc$UniqueHotkeys*88.5*60)) + geom_boxplot(outlier.color = 'blue', fill = 'purple', alpha = 0.5) + geom_smooth(method = 'loess', se = FALSE, color = 'blue', size = 1.2, aes(group = 1)) + theme(panel.background = element_rect(fill = 'white'), panel.grid.major = element_line(color = 'gray')) + labs(x = 'League', y = 'Unique Hotkeys Used per Minute')
#Display summary statistics for "UniqueHotkeys" for each league
for(league in rank) {
print(league)
print(summary(filter(sc, sc$League == league)$UniqueHotkeys*88.5*60))
}
## [1] "Bronze"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1065 0.2267 0.2404 0.3398 1.2800
## [1] "Silver"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1099 0.2124 0.2386 0.3348 0.7454
## [1] "Gold"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1306 0.2283 0.2629 0.3637 1.7920
## [1] "Platinum"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.1561 0.2551 0.2775 0.3729 1.2480
## [1] "Diamond"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2023 0.2992 0.3368 0.4402 1.5580
## [1] "Master"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2627 0.3725 0.3939 0.4910 1.4970
## [1] "GrandMaster"
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2113 0.2942 0.4269 0.4350 0.5334 0.9168
The data seems to support my intuition. In the lower leagues, the LOESS line increases quite slowly, but it suddenly increases in slope when transitioning from Platinum to Diamond League. It seems that this variable is not so important for less skilled players, but it’s a deciding factor for how skilled players are in the upper leagues.
Analysis of Each Metric with Respect to Time Variables
Having analyzed each of the relevant variables individually, I’d like to have a quick overview of the correlations between the various in-game metrics and each time variable.
library(corrplot)
library(viridis)
## Loading required package: viridisLite
#Visualize correlation coefficients between "Age" and all other variables
compare_age <- data.frame(c(sc[3], sc[6:11], sc[16:17], sc[19:20]))
corr_age <- cor(compare_age)
corrplot.mixed(corr_age, lower = 'number', upper = 'ellipse', col = viridis(256), title = 'Correlations with Player Age')
#Visualize correlation coefficients between "HoursPerWeek" and all other variables
compare_week <- data.frame(c(sc[4], sc[6:11], sc[16:17], sc[19:20]))
corr_week <- cor(compare_week)
corrplot.mixed(corr_week, lower = 'number', upper = 'ellipse', col = viridis(256), title = 'Correlations with Hours Played per Week')
#Visualize correlation coefficients between "TotalHours" and all other variables
compare_total <- data.frame(c(sc[5], sc[6:11], sc[16:17], sc[19:20]))
corr_total <- cor(compare_total)
corrplot.mixed(corr_total, lower = 'number', upper = 'ellipse', col = viridis(256), title = 'Correlations with Total Hours Played')
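The three blocks above select columns by position, which depends on the exact column order of `sc`. As a hedged alternative, the sketch below selects columns by name and tolerates missing values. It uses a toy data frame with made-up values; only the column names mirror variables discussed in the text, and whether the real time variables contain missing values is an assumption that would need to be checked.

```r
# Toy data frame with hypothetical values (not from the dataset).
toy <- data.frame(
  Age         = c(18, 21, NA, 25, 30, 22),
  APM         = c(60, 95, 120, 80, 70, 150),
  WorkersMade = c(3.1, 4.0, 5.2, 4.4, 3.8, 5.9)
)

# Metrics to compare against the time variable, chosen by name.
metric_cols <- c("APM", "WorkersMade")

# Pairwise-complete correlations, so a missing Age does not turn
# every coefficient involving Age into NA.
corr_age_toy <- cor(toy[, c("Age", metric_cols)],
                    use = "pairwise.complete.obs")

# Correlations between Age and each metric.
print(round(corr_age_toy["Age", metric_cols], 2))
```

The resulting matrix could be passed to `corrplot.mixed()` in the same way as `corr_age` above. Selecting by name makes it explicit which variables enter each comparison.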